A Comparative Impact Study of Attribute Selection Techniques on Naïve Bayes Spam Filters

Authors

  • José Ramon Méndez
  • I. Cid
  • Daniel Glez-Peña
  • Miguel Rocha
  • Florentino Fernández Riverola
Abstract

The main problem of the Internet e-mail service is the massive delivery of spam messages. Every day, Internet users receive hundreds of unwanted and unhelpful messages that flood their mailboxes. Fortunately, there are now different kinds of filters able to identify and automatically delete most of these messages. To reduce the dimensionality of the problem, only representative attributes are selected from each e-mail using feature selection techniques. This work presents a comparison among five well-known feature selection strategies when they are applied in conjunction with four different types of Naïve Bayes classifiers. The results of the experiments show the relevance of choosing an appropriate feature selection technique in order to obtain accurate results.
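As a rough illustration of the pipeline the abstract describes (select a few representative attributes, then classify with Naïve Bayes), here is a minimal sketch. The abstract does not name the five selection strategies or the four classifier variants, so this sketch assumes information gain as the selection criterion and a multinomial Naïve Bayes model with Laplace smoothing. The toy corpus and all function names are invented for illustration and are not taken from the study.

```python
import math
from collections import Counter

# Toy labelled corpus (invented for illustration; not data from the study).
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap cash offer now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday", "ham"),
]
DOCS = [(text.split(), label) for text, label in TRAIN]


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(term, docs):
    """Information gain of the binary 'term is present' attribute."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = labels - with_term
    gain = entropy(labels.values())
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            gain -= size / len(docs) * entropy(part.values())
    return gain


def select_features(docs, k):
    """Keep the k terms with the highest information gain (ties: alphabetical)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs), t))
    return ranked[:k]


def train(docs, features):
    """Multinomial Naive Bayes with Laplace smoothing over the selected terms."""
    features = set(features)
    class_docs = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in class_docs}
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in features)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(features)))
            for t in features
        }
        model[label] = (log_prior, log_likelihood)
    return model


def classify(model, tokens):
    """Return the class with the highest log-posterior; unselected terms are ignored."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)
    return max(model, key=score)


selected = select_features(DOCS, k=4)
model = train(DOCS, selected)
print(selected)
print(classify(model, "free cash prize now".split()))
print(classify(model, "agenda for monday".split()))
```

On this toy corpus, only four of the eighteen distinct terms are kept, and the classifier ignores everything else in a message. Swapping `information_gain` for a different scoring function changes which terms survive, which is the kind of effect the comparison in the paper is concerned with.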


Similar resources

Spam Detection System Combining Cellular Automata and Naïve Bayes Classifier

In this study, we focus on the problem of spam detection. Based on a cellular automaton approach and a naïve Bayes technique, which are built as individual classifiers, we evaluate a novel method combining multiple classifiers, diversified both by feature selection and by different classifiers, to determine whether we can detect spam more accurately. This approach combines decisions from three cellular ...


Naive Bayes Spam Filtering Using Word Position Attributes

This paper explores the use of the naive Bayes classifier as the basis for personalized spam filters. Various machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word position based attribute vectors gives very good results when tested on several publicly available corpora. The effect of various forms ...


Naive Bayes Spam Filtering Using Word-Position-Based Attributes

This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms o...


Naive Bayes spam filtering using word-position-based attributes and length-sensitive classification thresholds

This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms ...


Web Spam Detection Using Machine Learning in Specific Domain Features

In the last few years, as Internet usage has become the main artery of daily life, the problem of spam has become very serious for the Internet community. Spam pages pose a real threat to all types of users. This threat has continued to evolve with no sign of abating. Different forms of spam have seen a dramatic increase in both size and negative impact. A large amount of E-mail...




Publication date: 2008